BUG/TST: added TypeError if object dtypes are detected in dataframe #61682

Open
wants to merge 3 commits into
base: main
from

sharkipelago
Contributor

@sharkipelago sharkipelago commented Jun 20, 2025

This PR addresses the first concern in #55114: making Series.round and DataFrame.round behave consistently.

My solution is to raise a TypeError, similar to the approach taken in #61206.

I changed the following existing tests, but I'm a little worried that this might break some things, so any feedback is appreciated.

  1. I deleted test_round_mixed_type from tests/frame/methods/test_round.py because I felt that it conflicted with the intended behavior of DataFrame.round.
  2. I edited test_round in tests/copy_view/test_methods.py because it used a DataFrame containing both strings and ints.
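For reference, the check this PR adds can be approximated outside pandas internals as a small wrapper. This is a minimal sketch, not the actual patch: `round_strict` is a hypothetical helper name, the condition mirrors the `if "object" in self.dtypes.values:` line from the diff, and the message is borrowed from the existing Series test.

```python
import pandas as pd


def round_strict(df: pd.DataFrame, decimals: int = 0) -> pd.DataFrame:
    """Hypothetical wrapper approximating this PR's check (not pandas API).

    Raises TypeError if any column has object dtype, even when that column
    only holds numbers; otherwise defers to DataFrame.round.
    """
    if any(dtype == object for dtype in df.dtypes):
        raise TypeError("Expected numeric dtype, got object instead.")
    return df.round(decimals)


numeric = pd.DataFrame({"a": [1.234, 5.678]})
print(round_strict(numeric, 1))

# An explicit object column (here holding a string) triggers the new error.
mixed = pd.DataFrame({"a": [1.234], "b": pd.Series(["x"], dtype=object)})
try:
    round_strict(mixed)
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Under this rule an object column such as `pd.Series([0.2], dtype="object")` is rejected as well, which is the case debated later in the thread.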

@sharkipelago
Contributor Author

sharkipelago commented Jun 20, 2025

The second concern in #55114 was that round() did not work for a Series or DataFrame containing decimal.Decimal values from Python's standard-library decimal module. This seems to be because both NumPy and pandas store arrays of decimal.Decimal objects with object dtype. When I tested np.array([decimal.Decimal("1.2234242333")]).round(), it raised an error.

If we still want Series.round() and DataFrame.round() to work on decimal.Decimal objects without raising, I thought a clean solution would be to add a new custom dtype for decimal.Decimal. However, that seemed like a pretty big change, so I wanted to check whether there is another way I should approach this bugfix.
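The dtype point can be checked directly. This small illustration (values chosen arbitrarily) shows that NumPy and pandas both store `decimal.Decimal` values with object dtype, while Python's built-in `round()` still works on each element because `Decimal` implements `__round__`.

```python
import decimal

import numpy as np
import pandas as pd

values = [decimal.Decimal("1.2234242333"), decimal.Decimal("2.5")]

# Neither library has a built-in Decimal dtype, so both fall back to object.
arr = np.array(values)
ser = pd.Series(values)
print(arr.dtype, ser.dtype)  # object object

# Element by element, round() dispatches to Decimal.__round__, which quantizes
# using the decimal context's rounding mode (ROUND_HALF_EVEN by default).
print([round(v, 2) for v in values])  # [Decimal('1.22'), Decimal('2.50')]
```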

@sharkipelago sharkipelago changed the title BUG/TST: added TypeError if object dtypes are dtected in dataframe BUG/TST: added TypeError if object dtypes are detected in dataframe Jun 20, 2025
@simonjayhawkins simonjayhawkins added Bug DataFrame DataFrame data structure labels Jun 25, 2025
@jbrockmendel
Member

I would expect this to attempt to operate pointwise (which would still raise on, e.g., strings).

@sharkipelago
Contributor Author

i would expect this to attempt to operate pointwise (which would still raise on e.g. strings)

Do you mean I should rewrite the code so that it attempts to round every column individually and raises if there is a non-numeric column, as opposed to checking self.dtypes.values?

@sharkipelago
Contributor Author

sharkipelago commented Jun 30, 2025

Oh, because StringDtype also exists, along with other non-numeric dtypes besides object? Could I use pandas.api.types.is_numeric_dtype?

@jbrockmendel
Member

and other non-numeric dtypes outside of object?

I'm specifically thinking of object-dtype columns containing numeric entries.
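A dtype-only check such as `pandas.api.types.is_numeric_dtype` cannot tell these cases apart: it reports an object column as non-numeric even when every entry is a float, so only the float64 column below comes back `True`. Column names here are arbitrary.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame(
    {
        "floats": [0.2, 1.5],
        "object_numbers": pd.Series([0.2, 1.5], dtype="object"),
        "strings": pd.Series(["a", "b"], dtype="string"),
    }
)

# is_numeric_dtype inspects only the dtype, never the values themselves.
numeric_by_dtype = {name: is_numeric_dtype(dtype) for name, dtype in df.dtypes.items()}
print(numeric_by_dtype)
```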

@sharkipelago
Contributor Author

Ah okay, makes sense.

I think the current behavior of Series.round() is to raise when it is called on object dtype, though, even if the entries are numeric.

I think this is the relevant test, test_round_dtype_object() in pandas/tests/series/methods/test_round.py:

    def test_round_dtype_object(self):
        # GH#61206
        ser = Series([0.2], dtype="object")
        msg = "Expected numeric dtype, got object instead."
        with pytest.raises(TypeError, match=msg):
            ser.round()

Should I submit a PR to change this behavior first before implementing your pointwise solution?

@sharkipelago
Contributor Author

@jbrockmendel Hi, no worries if you're too busy to look into this right now; I'm just curious whether you have any insight on the comment above about making a separate PR first. Thanks!

Contributor

github-actions bot commented Aug 7, 2025

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Aug 7, 2025
@@ -11227,7 +11227,10 @@ def _series_round(ser: Series, decimals: int) -> Series:
            return ser

        nv.validate_round(args, kwargs)

        if "object" in self.dtypes.values:
Member

the thing to do is operate pointwise on object columns
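One way to read "operate pointwise on object columns" is sketched below, under stated assumptions rather than as the fix itself: `round_frame` is a hypothetical helper (not pandas API), object columns are rounded element by element with Python's `round()` as in `series.map(round)`, and every non-object column is assumed to keep the existing vectorized `Series.round`.

```python
import decimal

import numpy as np
import pandas as pd


def round_frame(df: pd.DataFrame, decimals: int = 0) -> pd.DataFrame:
    """Hypothetical sketch: pointwise for object columns, vectorized otherwise."""
    rounded = {}
    for name, col in df.items():
        if col.dtype == object:
            # Pointwise, like col.map(round); raises TypeError on e.g. strings.
            rounded[name] = col.map(lambda v: round(v, decimals), na_action="ignore")
        else:
            # Assumption: non-object dtypes keep the vectorized operation.
            rounded[name] = col.round(decimals)
    return pd.DataFrame(rounded, index=df.index)


df = pd.DataFrame(
    {
        "f": [1.234, np.nan],
        "o": pd.Series([decimal.Decimal("2.3456"), 0.999], dtype="object"),
    }
)
print(round_frame(df, 2))
```

Forwarding `decimals` as `ndigits` keeps floats as floats, and `na_action="ignore"` leaves missing values in object columns untouched, while a string entry still raises `TypeError`.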

Contributor Author

Just to confirm: are you saying that if an object column is present in the DataFrame, we should first try to round just that column and see whether an error is raised? That is, the intended behavior for an object column would be to raise on non-numeric entries but not on numeric entries?

Contributor Author

Sorry, I'm a little confused about what "operate pointwise" means in this context, because I think the current behavior of Series.round is to raise if the dtype is object, regardless of whether the entries are numeric. And since the goal of the PR is to make Series.round and DataFrame.round consistent, shouldn't we also raise if any of the columns have object dtype? Sorry if I'm overcomplicating this; I'm just trying to understand.

Member

And because the goal of the PR is to make Series.round and frame DataFrame.round consistent then shouldn't we also raise if any of the columns are of dtype object?

I'm saying the series version should behave like series.map(round)
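A minimal sketch of what "behave like series.map(round)" could mean for an object-dtype Series, assuming a hypothetical helper `round_object_pointwise` (not pandas API). Python's `round()` dispatches to each element's `__round__`, so Decimal, int, and float entries are rounded while strings raise `TypeError`; passing `ndigits` explicitly and using `na_action="ignore"` are choices made here so that missing values pass through untouched.

```python
import decimal

import numpy as np
import pandas as pd


def round_object_pointwise(ser: pd.Series, decimals: int = 0) -> pd.Series:
    """Hypothetical helper: round an object Series element by element."""
    # round(v, decimals) calls type(v).__round__, so each element keeps its own
    # type (Decimal stays Decimal, float stays float, int stays int);
    # na_action="ignore" skips missing values instead of passing them to round().
    return ser.map(lambda v: round(v, decimals), na_action="ignore")


ser = pd.Series([decimal.Decimal("1.2234242333"), 0.256, 3, np.nan], dtype="object")
print(round_object_pointwise(ser, 2).tolist())

try:
    round_object_pointwise(pd.Series(["a"], dtype="object"))
except TypeError as exc:
    # Strings have no __round__, so the pointwise version still raises.
    print(f"TypeError: {exc}")
```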

Contributor Author

@sharkipelago sharkipelago Aug 17, 2025

Oh okay, so object columns should behave like series.map(round), while every other dtype should continue to use the existing non-pointwise operation.

So, for example, with

ser = pd.Series([1.22, 3.33, np.nan], dtype="float64")

round(ser) should continue not to raise an error, even though round(ser[2]) does raise?
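The difference in that example comes from how each path handles NaN. The vectorized `Series.round` propagates NaN, whereas Python's `round()` without `ndigits` has to return an `int`, which is impossible for NaN; with `ndigits` it returns a float and NaN passes through. The check below uses a plain Python `float("nan")` rather than the `np.float64` scalar `ser[2]`, whose `round()` behavior may depend on the NumPy version.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.22, 3.33, np.nan], dtype="float64")
print(ser.round().tolist())  # [1.0, 3.0, nan]

nan = float("nan")
try:
    round(nan)  # no ndigits: Python must convert the result to int
except ValueError as exc:
    print(f"ValueError: {exc}")  # ValueError: cannot convert float NaN to integer

print(round(nan, 2))  # nan
```

So a pointwise implementation that always forwards `decimals` as `ndigits` would not hit this ValueError on missing values.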

Labels
Bug DataFrame DataFrame data structure Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: round not functioning as expected
3 participants